Alignment-free Sequence Analysis Using Extensible Markov Models

Authors

  • Rao M. Kotamarti
  • Margaret H. Dunham
Abstract

Profile models based on Hidden Markov Models (HMMs) have gained visibility among researchers in sequence studies. While their mathematical foundation and proven algorithms such as the Viterbi, Forward, and Backward algorithms provide a rigorous probabilistic platform, the requirement of classic alignment results in extremely high time complexity. We propose the use of another kind of Markov model, the Extensible Markov Model (EMM), to create profile architectures that are more efficient in space and time complexity than their HMM counterparts. EMM efficiency comes from an alignment-free paradigm that uses an improved statistical signature form of sequences. The EMM approach is based on sliding p-mers: every possible p-mer pattern is counted along equal-sized segments of a sequence, and the resulting count vectors are then clustered into Markov states. These count vectors shift the position-based, letter-by-letter sequence analysis problem for phylogenetic trees, classification, and search into a more efficient numerical vector space. Using Karlin-Altschul statistics adapted from the Basic Local Alignment Search Tool (BLAST) literature, EMM-based sequence classification also computes a p-value for statistical significance. We present a comparison between profiles generated using profile HMMs and EMMs.
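The sliding p-mer counting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pmer_count_vectors`, the DNA alphabet, the default values of `p` and `segment_len`, and the choice to drop a trailing partial segment are all assumptions made for illustration. The clustering of count vectors into Markov states is not shown.

```python
from itertools import product


def pmer_count_vectors(sequence, p=3, segment_len=100, alphabet="ACGT"):
    """Return one p-mer count vector per equal-sized segment of `sequence`.

    Hypothetical helper: the name, defaults, and alphabet are assumptions,
    not taken from the paper.
    """
    # Fixed ordering of all len(alphabet)**p possible p-mers.
    index = {"".join(k): i for i, k in enumerate(product(alphabet, repeat=p))}
    vectors = []
    # Walk over non-overlapping, full-length segments.
    # Assumption: a trailing partial segment is dropped.
    for start in range(0, len(sequence) - segment_len + 1, segment_len):
        segment = sequence[start:start + segment_len]
        counts = [0] * len(index)
        # Slide a window of width p one letter at a time.
        for i in range(len(segment) - p + 1):
            j = index.get(segment[i:i + p])
            if j is not None:  # skip windows with letters outside the alphabet
                counts[j] += 1
        vectors.append(counts)
    return vectors
```

Each returned vector has one entry per possible p-mer, so segments can be compared numerically without aligning them letter by letter.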


Similar resources

Analyzing taxonomic classification using extensible Markov models

MOTIVATION As next-generation sequencing rapidly adds new genomes, their correct placement in the taxonomy needs verification. However, the current methods for confirming the classification of a taxon or suggesting a revision for a potential misplacement rely on computationally intensive multi-sequence alignment followed by an iterative adjustment of the distance matrix. Due to intra-heterogenei...


A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...


xREI: a phylo-grammar visualization webserver

Phylo-grammars, probabilistic models combining Markov chain substitution models with stochastic grammars, are powerful models for annotating structured features in multiple sequence alignments and analyzing the evolution of those features. In the past, these methods have been cumbersome to implement and modify. xrate provides means for the rapid development of phylo-grammars (using a simple fil...


An Analysis of Continuous Time Markov Chains using Generator Matrices

This paper mainly analyzes the applications of generator matrices in a Continuous Time Markov Chain (CTMC). Hidden Markov models (HMMs), together with related probabilistic models such as Stochastic Context-Free Grammars (SCFGs), are the basis of many algorithms for the analysis of biological sequences. Combined with the continuous-time Markov chain theory of likelihood based phylogeny, stoch...


Using evolutionary Expectation Maximization to estimate indel rates

MOTIVATION The Expectation Maximization (EM) algorithm, in the form of the Baum-Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the ...



Publication date: 2010